An approach to similarity measurement of absence-presence data: the case that common zeros matter
Authors
Abstract
Similarity between objects (documents, persons, answers to a questionnaire, etc.) is generally determined through relations between representations of these objects. In the case of binary representations, the presence of a property (e.g., an index term) carries a weight of one and its absence a weight of zero. In many similarity studies common zeros are ignored; this situation is called the zero-insensitive case. In this article, however, we study the zero-sensitive case. Answers to binary questionnaires (yes-no, encoded as 1-0) are clearly zero sensitive, because people who answer 'no' to the same questions are more similar. We present a wish list for such a zero-sensitive approach to similarity. Distinguishing between common zeros and common ones leads to an 'identity-similarity' theory; hence, we move beyond a pure similarity theory. Three approaches are presented to the problem of measuring the similarity of presence-absence data when common zeros matter. In each case a coding approach is used, leading to new representations, which in turn lead to a similarity ranking. Examples of functions respecting these rankings are given.
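The abstract contrasts zero-insensitive and zero-sensitive similarity without giving formulas. The Python below is a minimal sketch of that distinction. It uses two standard textbook coefficients, not the authors' own coding approaches, which the abstract does not specify. The Jaccard coefficient ignores common zeros, while the simple matching coefficient counts them as agreement. The helper names `match_counts`, `jaccard`, and `simple_matching` and the questionnaire vectors are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def match_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two 0/1 vectors of equal length:
    a = common ones, b = 1 in x only, c = 1 in y only, d = common zeros."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Zero-insensitive: d (common zeros) does not appear in the formula."""
    a, b, c, _ = match_counts(x, y)
    denom = a + b + c
    # Undefined for two all-zero vectors; returning 0.0 is an arbitrary choice here.
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Zero-sensitive: common zeros count as agreement, weighted like common ones."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty vectors")
    return (a + d) / n


if __name__ == "__main__":
    # Hypothetical yes/no answers (1 = yes, 0 = no) to six questions.
    x1, y1 = [1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]  # agree on four 'no' answers
    x2, y2 = [1, 1, 1, 0, 1, 1], [1, 1, 1, 1, 0, 0]  # no common 'no' answers
    print(jaccard(x1, y1), jaccard(x2, y2))  # 0.5 0.5
    print(round(simple_matching(x1, y1), 3), simple_matching(x2, y2))  # 0.833 0.5
```

Both pairs receive the same Jaccard value. Simple matching, however, ranks the first pair as more similar because those respondents share four 'no' answers. Simple matching still weights a common zero exactly like a common one. The abstract's 'identity-similarity' theory is motivated by distinguishing the two, and this sketch does not attempt that.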
Similar resources
Performance Measurement of Decision Making Units with Network Structure in the Presence of Undesirable Output
In performance evaluation with classic data envelopment analysis (DEA) models, decision making units (DMUs) are treated as black boxes. However, in many cases and applications, such as investment funds, banks, and insurance companies, DMUs have a network structure. In addition, in many network structures, some of the indicators used to calculate the efficiency...
Performance analysis in production processes in the presence of fixed-sum outputs
Performance measurement in the presence of fixed-sum outputs in data envelopment analysis (DEA) is an interesting and frequently studied subject in the field of operations research. Different definitions of relative efficiency in the presence of fixed-sum outputs have been proposed in the literature of data envelopment analysis and in all of the existing definitions a common equilibrium ef...
Deriving Common Set of Weights in the Presence of the Undesirable Inputs: A DEA based Approach
Data Envelopment Analysis (DEA), as a non-parametric method for efficiency measurement, allows decision making units (DMUs) to select the most advantageous weight factors in order to maximize their efficiency scores. In most practical applications of DEA in the literature, the models assume that all inputs are fully desirable. However, in many real situations undesirable inpu...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
Technical Note: Performance measurement in industrial organizations, case study: Zarbal Complex
Industrial organizations are complex systems in which the interactions among the various functions, such as Sales, Distribution, Manufacturing, Materials, Finance, Human Resources and Maintenance, have to be managed towards the common purpose of delivering customer satisfaction. However, since most of these organizations have a 'Functional Structure', each function or department works towards ...
Journal title:
- J. Information Science
Volume 30, Issue
Pages -
Publication date: 2004